Self-organization in social tagging systems 

Individuals often imitate each other to fall into the typical group, leading to a self-organized state
of typical behaviors in a community. In this paper, we model self-organization in social tagging
systems and illustrate the underlying interaction and dynamics. Specifically, we introduce a model
in which individuals adjust their own tagging tendency to imitate the average tagging tendency. We
find that when users have low confidence, they tend to imitate others, which leads to a self-organized
state with active tagging. On the other hand, when users have high confidence and are resistant
to change, tagging becomes inactive. We observe a phase transition at a critical level of user
confidence, where the system changes from one regime to the other. The distributions of post length
obtained from the model are compared with real data and show good agreement.

PACS numbers: 89.65.-s, 89.20.Hh, 05.65.+b



I. INTRODUCTION 

Self-organization is an interesting phenomenon observed in various areas including network
growth [1], traffic jams [2] and resource allocation [3]. In social systems, individuals often
imitate each other through interaction and observation, to become more typical in the community.
Such dynamics result in a steady state in which most individuals adopt the typical practice by
learning from each other. In online communities, self-organization is further facilitated by the
recent advent of Web 2.0 social applications, which encourage Internet users to interact with
peers. By interacting with each other, users self-organize and reach a state of typical behaviors.

In resource sharing applications, tags are a practical way to facilitate the search and
management of resources [4, 5]. Tags are usually simple labels and annotations which help users
to gain a preliminary understanding of the content before collecting the resources. Recently,
tagging systems have been implemented in popular applications including delicious.com,
flickr.com and citeulike.org. To organize their resources well, users assign tags to their
bookmarks, pictures or BibTeX files. By browsing through tags, users are able to find other users
who share similar interests. Tags thus reflect user behaviors and preferences, and with them one
can easily search, collaborate and form communities with others [6].

Tagging systems have been studied extensively in recent years, but the underlying interaction
and dynamics among tag users are still unclear. Mathematically, tagging systems are composed of
fundamental units of user-resource-tag triples [7, 8]: each tagging action constitutes one or
several hyper-links in a tripartite graph. Such user-resource-tag relations are often referred to
as a folksonomy. Examples include the use of keywords or PACS numbers in academic papers, which
also helps to reveal the structure of citation networks [9, 10]. However, how similar papers
influence each other in the choice of keywords is still an open question. To reveal the tagging
dynamics, Cattuto et al. [11] suggested considering the process of social annotation as a
collective yet uncoordinated exploration of the underlying semantic space through a series of
random walks. In Ref. [12], Lambiotte et al. modeled folksonomy in terms of tripartite graphs.
Zhang and Liu [13] proposed a model to explain some statistical properties of folksonomy, in
which users can search for resources via tags. Many of these studies consider individual tag
assignment, while ignoring the interaction among peer tag users.

In this paper, we propose a model to investigate the dynamics and interaction among individuals
in a tagging system. Specifically, individuals imitate each other in tagging, which results in a
self-organized state. We find that when users have low confidence, they self-organize to attain a
steady state of active tagging. On the other hand, the system ends with inactive tagging when
users are confident of their own tagging practice. In addition, a phase transition is observed at
a critical level of user confidence, where the system changes from one regime to the other.
Furthermore, we compare distributions of post length from the proposed model with two real
datasets obtained from delicious.com and flickr.com, which show good agreement.



II. MODEL 

We consider a model of a tagging system with $N$ users. At each step, each user posts one
resource and assigns tags to the resource. The tendency with which user $i$ assigns tags is
characterized by $p_i(t)$, which is the probability that the user continues with tag assignment
for the resource. In other words, the probability that user $i$ assigns $n_i(t) = l$ tags at
time $t$ is given by



\[
\Pr[n_i(t) = l] = p_i(t)^{\,l-1}\,[1 - p_i(t)], \tag{1}
\]



where $l = 1, 2, 3, \ldots$. A large $p_i(t)$ corresponds to a high tendency to assign tags and
vice versa. We thus call $p_i(t)$ the tagivity, which characterizes the tendency of user $i$ in
tag assignment. Given that $p_i(t)$ remains unchanged, $n_i(t)$ follows a geometric distribution
with parameter $1 - p_i(t)$. We model the self-organization of users by assuming that users
adjust their $p_i(t)$ based on the observation of $\langle p(t) \rangle$, the average tagivity
over all users at time $t$.
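
As an illustration of Eq. (1), the following minimal Python sketch draws a post length by adding tags one at a time and continuing with probability $p_i(t)$ after each tag. The function names `sample_post_length` and `mean_post_length` are illustrative choices and do not come from the original text.

```python
import random


def sample_post_length(p, rng):
    """Draw a post length n >= 1 with Pr[n = l] = p**(l - 1) * (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("tagivity p must lie in [0, 1)")
    n = 1  # every post carries at least one tag
    # After each tag, the user continues tagging with probability p.
    while rng.random() < p:
        n += 1
    return n


def mean_post_length(p):
    """Expected post length of the geometric distribution, 1 / (1 - p)."""
    return 1.0 / (1.0 - p)


rng = random.Random(0)
samples = [sample_post_length(0.5, rng) for _ in range(20000)]
sample_mean = sum(samples) / len(samples)
```

With a tagivity of 0.5 the sample mean should lie close to `mean_post_length(0.5)`, i.e. two tags per post.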

As one main purpose for tagging is to facilitate the 
search of resources for others, users would tend to adopt 
a more typical tagging practice. They thus adjust their 
own tagivity in order to imitate the observed average ta- 
givity over users. We denote the combination of tags 
associated with a resource to be a post. Based on ob- 
servations, users obtain an estimated distribution of post 
length, which is the number of tags associated with each 
post. We assume that the users estimate the distribution 
based on the average user tagivity, as given by 



\[
\Pr[l' = l] = \langle p(t) \rangle^{\,l-1}\,[1 - \langle p(t) \rangle], \tag{2}
\]



where $l'$ corresponds to the observed post length. With this distribution in mind, user $i$
randomly picks a post and imitates its length in the next step. Suppose user $i$ assigns
$n_i(t)$ tags at time $t$; the probability that he/she picks a post of length $l'$ less than
$n_i(t)$ is given by



\[
\Pr[l' < n_i(t)] = 1 - \langle p(t) \rangle^{\,n_i(t)-1}. \tag{3}
\]



On the other hand, the probability that user $i$ picks a post of length $l'$ larger than
$n_i(t)$ is given by



\[
\Pr[l' > n_i(t)] = \langle p(t) \rangle^{\,n_i(t)}. \tag{4}
\]



With probability $\langle p(t) \rangle^{\,n_i(t)-1}\,[1 - \langle p(t) \rangle]$, user $i$ picks
a post of length equal to his/her own post length at time $t$.
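
The three cases above can be checked numerically. The sketch below evaluates the probabilities of picking a shorter, equal or longer post from the geometric distribution of Eq. (2); the function name `pick_probabilities` is an illustrative choice.

```python
def pick_probabilities(p_avg, n):
    """Return (Pr[l' < n], Pr[l' = n], Pr[l' > n]) for l' drawn from Eq. (2).

    p_avg is the average tagivity <p(t)> and n is the user's own post length.
    """
    shorter = 1.0 - p_avg ** (n - 1)           # Eq. (3)
    longer = p_avg ** n                        # Eq. (4)
    equal = p_avg ** (n - 1) * (1.0 - p_avg)   # remaining probability mass
    return shorter, equal, longer


shorter, equal, longer = pick_probabilities(0.6, 3)
```

For an average tagivity of 0.6 and an own post length of 3, the three values are 0.64, 0.144 and 0.216, which sum to one as required.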

Users imitate the post they pick by changing their tagivities. For instance, user $i$ increases
his/her tagivity if $n_i(t)$ is smaller than $l'$, and vice versa. We denote the probabilities
with which user $i$ increases, maintains or decreases his/her tagivity as $\eta_i^{+}(t)$,
$\eta_i^{0}(t)$ and $\eta_i^{-}(t)$, given by

\[
\eta_i^{+}(t) = \frac{(1-\beta)\,\langle p(t) \rangle^{\,n_i(t)}}{Z_i(t)}, \tag{5}
\]



where $Z_i(t)$ ensures $\eta_i^{+}(t) + \eta_i^{0}(t) + \eta_i^{-}(t) = 1$. The parameter
$\beta \in [0, 1]$ can be considered as the confidence of a user in his/her own tagivity:
$\beta = 0$ corresponds to the case with unconfident users who tend to change their tagivities at
every time step, and $\beta = 1$ corresponds to the case with confident users who keep their
tagivities at every time step. Increasing $\beta$ from 0 to 1 characterizes the increase in user
confidence, such that users become more reluctant to change.
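
A minimal sketch of how the three probabilities could be computed is given below. Only the form of $\eta_i^{+}$ is taken from Eq. (5). The weights used here for keeping and decreasing the tagivity, namely $\beta$ times the probability of picking a post of equal length and $(1-\beta)$ times the probability of picking a shorter post, are assumptions made for illustration, as is the function name `transition_probabilities`.

```python
def transition_probabilities(p_avg, n, beta):
    """Return (eta_plus, eta_zero, eta_minus) for own post length n.

    eta_plus follows Eq. (5). The weights for eta_zero and eta_minus are
    ASSUMED forms chosen for this sketch.
    """
    longer = p_avg ** n
    equal = p_avg ** (n - 1) * (1.0 - p_avg)
    shorter = 1.0 - p_avg ** (n - 1)

    w_up = (1.0 - beta) * longer      # numerator of Eq. (5)
    w_stay = beta * equal             # assumed weight for keeping the tagivity
    w_down = (1.0 - beta) * shorter   # assumed weight for decreasing it
    z = w_up + w_stay + w_down        # normalization Z_i(t)
    if z == 0.0:
        return 0.0, 1.0, 0.0          # degenerate case: nothing to imitate
    return w_up / z, w_stay / z, w_down / z


eta_plus, eta_zero, eta_minus = transition_probabilities(0.6, 3, 0.5)
```

With $\beta = 0$ the probability of keeping the tagivity vanishes, and with $\beta = 1$ the user never changes, matching the two limiting cases described in the text.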

We propose two response functions based on which the 
tagivity is updated. In the first case, the tagivity is up- 
dated linearly by 



\[
p_i(t+1) = p_i(t) + a_i(t)\,\delta_l, \tag{8}
\]



where $a_i(t) = 1, 0, -1$ respectively with probabilities $\eta_i^{+}(t)$, $\eta_i^{0}(t)$,
$\eta_i^{-}(t)$, and $\delta_l > 0$ is a parameter which characterizes the extent to which the
tagivity is changed. When $a_i(t) = 1$ or $-1$, the tagivity increases or decreases. The
parameter $\delta_l$ can be interpreted as the adaptability of the users. A large $\delta_l$
corresponds to faster adaptation to the typical behaviors.

In the second case, the complementary tagivity $1 - p_i(t)$ is updated multiplicatively by

\[
1 - p_i(t+1) = [1 - p_i(t)]\,(1 + \delta_m)^{-a_i(t)}, \tag{9}
\]



where $\delta_m > 0$ serves the same role as $\delta_l$ in the linear update. A more explicit
implication of this multiplicative updating can be obtained from the relation
$E[n_i(t)] = [1 - p_i(t)]^{-1}$, where $E[n_i(t)]$ is the expected value of $n_i(t)$ based on the
geometric distribution. Equation (9) thus implies

\[
E[n_i(t+1)] = E[n_i(t)]\,(1 + \delta_m)^{a_i(t)}. \tag{10}
\]



In other words, the expected value of $n_i(t)$ respectively increases by a factor of
$(1 + \delta_m)$, remains unchanged or decreases by a factor of $(1 + \delta_m)^{-1}$ with
$a_i(t) = 1, 0, -1$.
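
The two response functions can be sketched as follows. The multiplicative rule is written in the form $1 - p_i(t+1) = [1 - p_i(t)](1+\delta_m)^{-a_i(t)}$, which is the form consistent with Eq. (10). Clipping the tagivity to the interval $[0, 1]$ is an assumption added here to keep it a valid probability, and the function names are illustrative.

```python
def linear_update(p, a, delta_l):
    """Eq. (8): shift the tagivity by a * delta_l, with a in {1, 0, -1}.

    The result is clipped to [0, 1] (an assumption of this sketch).
    """
    return min(1.0, max(0.0, p + a * delta_l))


def multiplicative_update(p, a, delta_m):
    """Rescale the complementary tagivity 1 - p by (1 + delta_m) ** (-a).

    The complementary tagivity is capped at 1 (an assumption of this sketch),
    so that the updated tagivity never becomes negative.
    """
    q = (1.0 - p) * (1.0 + delta_m) ** (-a)
    return 1.0 - min(1.0, q)


# Expected post length 1 / (1 - p) before and after one upward step.
before = 1.0 / (1.0 - 0.5)
after = 1.0 / (1.0 - multiplicative_update(0.5, 1, 0.1))
```

Starting from a tagivity of 0.5, one upward multiplicative step with $\delta_m = 0.1$ raises the expected post length from 2 to 2.2, i.e. by the factor $(1 + \delta_m)$.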



III. SIMULATION RESULTS 

To reveal the dynamics underlying self-organization in the model, we conduct numerical
simulations. We start with random initial $p_i(0)$ for all users. At time $t$, $n_i(t)$ is drawn
according to the probabilities in Eq. (1), such that $\eta_i^{+}$, $\eta_i^{0}$ and
$\eta_i^{-}$ are evaluated according to Eqs. (5)-(7). The tagivity $p_i(t)$ of each user is then
updated according to Eq. (8) in the case of linear update or Eq. (9) in the case of
multiplicative update. Unless specified, the results are obtained when the system converges, i.e.
when $\langle p(t) \rangle$ becomes steady. We observed that $\langle p(t) \rangle$ has a slight
fluctuation around a time-averaged value, and the fluctuation is dependent on $\delta_l$ and
$\delta_m$.
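
The simulation procedure can be sketched in a few lines of Python for the linear update. This is a sketch under stated assumptions rather than the code used for the reported results: the weights for keeping and decreasing the tagivity are assumed forms (only the weight for increasing it follows Eq. (5)), all users are updated synchronously from the same average, the tagivity is clipped to $[0, 1]$, and post lengths are capped at a hypothetical `max_len`.

```python
import random


def simulate_linear(num_users=100, beta=0.45, delta_l=0.05, steps=200,
                    max_len=50, seed=0):
    """Return (history of <p(t)>, final tagivities) for the linear update."""
    rng = random.Random(seed)
    p = [rng.random() for _ in range(num_users)]  # random initial tagivities
    history = []
    for _ in range(steps):
        p_avg = sum(p) / num_users
        history.append(p_avg)
        updated = []
        for p_i in p:
            # Draw the post length from Eq. (1), capped at max_len.
            n = 1
            while n < max_len and rng.random() < p_i:
                n += 1
            # Unnormalized weights for increasing, keeping, decreasing.
            w_up = (1.0 - beta) * p_avg ** n                    # Eq. (5)
            w_stay = beta * p_avg ** (n - 1) * (1.0 - p_avg)    # assumed
            w_down = (1.0 - beta) * (1.0 - p_avg ** (n - 1))    # assumed
            u = rng.random() * (w_up + w_stay + w_down)
            if u < w_up:
                a = 1
            elif u < w_up + w_stay:
                a = 0
            else:
                a = -1
            # Linear update of Eq. (8), clipped to [0, 1].
            updated.append(min(1.0, max(0.0, p_i + a * delta_l)))
        p = updated
    return history, p


history, final_p = simulate_linear(num_users=30, steps=50, seed=1)
```

The returned `history` holds the average tagivity at each step and can be inspected for the steady state discussed in the text.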



A. Convergence time 

We first study the relation between the convergence time and the parameters $\beta$,
$\delta_l$ and $\delta_m$. The self-organized state in our context corresponds to the state in
which $\langle p(t) \rangle$ becomes steady. The convergence time $\tau$ is thus defined by the
relation $\langle p(t) \rangle \approx \langle p(t+L) \rangle$, for all $t > \tau$ and some
sufficiently large $L$.
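
One way to turn this definition into a measurement on a recorded time series is sketched below. The tolerance `tol`, which makes the approximate equality precise, and the function name `convergence_time` are assumptions of this sketch; the text does not specify how the criterion was implemented.

```python
def convergence_time(history, lag, tol):
    """Return the first step t0 such that |h[t] - h[t + lag]| <= tol for every
    recorded t >= t0, or None if the series has not settled within the record.
    """
    if lag <= 0 or len(history) <= lag:
        return None
    last_violation = -1
    for t in range(len(history) - lag):
        if abs(history[t] - history[t + lag]) > tol:
            last_violation = t
    if last_violation == len(history) - lag - 1:
        return None  # still changing at the end of the record
    return last_violation + 1


settled = convergence_time([0.0, 0.2, 0.4, 0.5, 0.5, 0.5, 0.5, 0.5], lag=2, tol=0.01)
oscillating = convergence_time([0.0, 1.0, 0.0, 1.0], lag=1, tol=0.01)
```

In the first example the series stops changing from step 3 onward, while the second series keeps oscillating and no convergence time is reported.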

The convergence time is plotted in Fig. 1(a) as a function of the confidence $\beta$. As similar
results are obtained from the two update rules, we present only the results obtained from the
linear update. As shown in Fig. 1(a), the larger the adaptability $\delta_l$, the shorter the
convergence time. The prominent peaks of convergence time observed at $\beta \approx 0.5$
suggest the possibility of a phase transition at $\beta \approx 0.5$, as the dynamics slows down.
Furthermore, the peak positions are similar at different values of $\delta_l$. This implies that,
when the weight in Eqs. (5)-(7) to modify the tagivity is equal to that to maintain the tagivity,
the users are confused and the self-organization slows down. As the convergence time is also
dependent on system size, we plot $\tau$ as a function of $N$ in log-log scale at $\beta = 0.5$
in Fig. 1(b), as studies [14, 15] suggest a conventional scaling of $\ln \tau \propto \ln N$ in
the proximity of a phase transition. These results suggest that on top of the self-organization,
there is a phase transition in the range close to $\beta = 0.5$.
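
A scaling of the form $\ln \tau \propto \ln N$ can be examined by fitting a straight line to the data in log-log scale. The sketch below computes the least-squares slope; the data are synthetic and only illustrate the procedure, they are not taken from Fig. 1(b), and the function name `loglog_slope` is an illustrative choice.

```python
import math


def loglog_slope(sizes, times):
    """Least-squares slope of ln(times) against ln(sizes)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    var = sum((x - x_mean) ** 2 for x in xs)
    return cov / var


# Synthetic example: times that grow exactly as the square root of the size.
sizes = [100, 200, 400, 800]
times = [3.0 * n ** 0.5 for n in sizes]
slope = loglog_slope(sizes, times)
```

For this synthetic power law the fitted slope recovers the exponent 0.5; scattered simulation data would give a slope with some uncertainty around the underlying exponent.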





FIG. 1: (a) Convergence time $\tau$ as a function of $\beta$ for different $\delta_l$.
Convergence time peaks around $\beta = 0.5$. The larger the adaptability $\delta_l$, the more
quickly the system reaches the steady state. (b) Convergence time as a function of $N$ when
$\beta = 0.5$, which shows scattering of data around the straight line implying
$\ln \tau \propto \ln N$.



FIG. 2: The tagivity distributions with (a) the linear update and (b) the multiplicative update.
Parameters: $\beta = 0.45$ and $\delta_l = 0.05$ for linear update and $\beta = 0.4$ and
$\delta_m = 0.1$ for multiplicative update. Fittings: (a) Gaussian fit with $\mu = 0.63$ and
$\sigma = 0.088$ and (b) log-normal fit with $\mu = -1.41$ and $\sigma = 0.27$.



B. Steady distributions of tagivity 

As mentioned in Sec. II, each user at each step randomly picks a post and imitates its length;
their tagivities thus fluctuate around the average values. We show in Fig. 2 the stable
distribution of tagivity after the system converges. Figure 2(a) shows that the stable
distribution from the linear update resembles a Gaussian distribution. The simulation results are
obtained with $\beta = 0.45$ and $\delta_l = 0.05$, and the parameters of the Gaussian fit are
$\mu = 0.63$ and $\sigma = 0.088$. This result is not as obvious as Eq. (8) might suggest. If
$a_i(t)$ were a purely random variable, the sum of $a_i(t)$ over time would result in an infinite
variance of $p_i(t)$, as compared to the finite variance observed in Fig. 2(a). The finite
variance of $p_i(t)$ comes from the restoring process of $a_i(t)$ around the typical behaviors,
as given by the probabilities in Eqs. (5)-(7). Figure 2(b) shows the stable distribution of
tagivity obtained from the multiplicative update, where $1 - p_i$ approximately follows the
log-normal distribution. Simulation results are obtained with $\beta = 0.4$ and
$\delta_m = 0.1$, with a log-normal fit of $\mu = -1.41$ and $\sigma = 0.27$. The origin of the
log-normal distribution is similar to that of the Gaussian distribution and can be seen by taking
the logarithm of Eq. (9).
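
As a sketch of that argument, assume the multiplicative rule takes the form $1 - p_i(t+1) = [1 - p_i(t)](1+\delta_m)^{-a_i(t)}$, which is the form consistent with Eq. (10). Taking the logarithm and iterating gives

```latex
\begin{align}
\ln[1 - p_i(t+1)] &= \ln[1 - p_i(t)] - a_i(t)\,\ln(1 + \delta_m), \\
\ln[1 - p_i(t)]   &= \ln[1 - p_i(0)] - \ln(1 + \delta_m) \sum_{s=0}^{t-1} a_i(s).
\end{align}
```

The quantity $\ln[1 - p_i(t)]$ therefore moves in additive steps of fixed size $\ln(1 + \delta_m)$, in the same way that $p_i(t)$ moves in steps of size $\delta_l$ under the linear update. If the additive process leads to an approximately Gaussian distribution, then $\ln(1 - p_i)$ is approximately Gaussian, which means that $1 - p_i$ is approximately log-normal.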

IV. PHASE TRANSITION AND SELF-ORGANIZATION

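Though analytic solutions for the general case are difficult to obtain, we can write down a
simple description of the steady state when $\delta_l \to 0$ or $\delta_m \to 0$. In this case we
assume $p_i \approx \langle p \rangle$ for all users $i$. We further introduce a quantity
$\Delta$ which characterizes the tendency for $\langle p \rangle$ to increase or decrease, as
given by

\[
\Delta(\langle p \rangle) = \sum_{n=1}^{\infty} \langle p \rangle^{\,n-1}\,(1 - \langle p \rangle)\,
[\eta^{+}(n, \beta) - \eta^{-}(n, \beta)]. \tag{11}
\]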

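$\Delta$ describes the difference between $\eta^{+}$ and $\eta^{-}$ when the average user
tagivity is $\langle p \rangle$. A positive $\Delta$ corresponds to a tendency for
$\langle p \rangle$ to increase, and vice versa. Substituting Eqs. (5) and (7) for $\eta^{+}$
and $\eta^{-}$ into Eq. (11) leads to the following expression

\[
\Delta(\langle p \rangle) = \sum_{n=1}^{\infty}
\frac{(1-\beta)\,\langle p \rangle^{\,n-1}\,(1 - \langle p \rangle)\,
(\langle p \rangle^{\,n} - 1 + \langle p \rangle^{\,n-1})}{Z(n, \beta)}. \tag{12}
\]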



We numerically evaluate the summation in Eq. (12) and obtain the values of $\langle p \rangle$
at which $\Delta = 0$, i.e. when there is no tendency for $\langle p \rangle$ to increase or
decrease and the system becomes steady.
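
A minimal sketch of such a numerical evaluation is shown below. It truncates the sum at a hypothetical `n_max` and uses assumed forms for the weights: the weight for increasing the tagivity follows Eq. (5), while the weights for keeping and decreasing it, $\beta$ and $(1-\beta)$ times the probabilities of picking a post of equal and shorter length, are assumptions of this sketch.

```python
def tendency(p_avg, beta, n_max=2000):
    """Truncated sum for the tendency Delta of the average tagivity.

    Each post length n occurs with probability p_avg**(n-1) * (1 - p_avg)
    and contributes eta_plus - eta_minus, computed from ASSUMED weights.
    """
    total = 0.0
    for n in range(1, n_max + 1):
        prob_n = p_avg ** (n - 1) * (1.0 - p_avg)
        w_up = (1.0 - beta) * p_avg ** n                  # Eq. (5)
        w_down = (1.0 - beta) * (1.0 - p_avg ** (n - 1))  # assumed
        w_stay = beta * prob_n                            # assumed
        z = w_up + w_stay + w_down
        if z > 0.0:
            total += prob_n * (w_up - w_down) / z
    return total


low_confidence = tendency(0.5, 0.3)
high_confidence = tendency(0.5, 0.7)
critical = tendency(0.3, 0.5)
```

Under these assumptions the tendency is positive for $\beta = 0.3$, negative for $\beta = 0.7$ and vanishes at $\beta = 0.5$, where the normalization no longer depends on $n$.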



FIG. 3: (Color online) $\Delta$ as a function of $\langle p \rangle$ for different values of
$\beta$.

Figure 3 shows $\Delta$ as a function of $\langle p \rangle$ for different $\beta$. These
results imply that for all $\beta$, $\langle p \rangle = 0$ and $\langle p \rangle = 1$ are
solutions of $\Delta = 0$. The fixed points $\langle p \rangle = 0$ and $\langle p \rangle = 1$
respectively correspond to the cases when all users in the system stop active tagging or assign
an infinite number of tags. When $\beta < 0.5$, we get $\Delta > 0$ for all $\langle p \rangle$,
which implies that the tendency to increase tagivity is higher than that to decrease tagivity,
leading to a stable fixed point at $\langle p \rangle = 1$. On the other hand, when
$\beta > 0.5$, we get $\Delta < 0$, which implies that the tendency for the tagivity to decrease
is larger than that to increase, leading to the opposite result of a stable fixed point at
$\langle p \rangle = 0$. This drastic change of the self-organized state corresponds to a phase
transition at $\beta \approx 0.5$ from a regime with active tag assignment to one with inactive
tag assignment. It is also interesting to note that when $\beta = 0.5$, $Z = 1$ for all $n$ in
Eq. (12), such that $\Delta = 0$ is guaranteed by the identity

\[
\sum_{n=1}^{\infty} \langle p \rangle^{\,n-1}\,
(1 - \langle p \rangle^{\,n-1} - \langle p \rangle^{\,n}) = 0 \tag{13}
\]

for all values of $\langle p \rangle$. This implies that at the critical point $\beta = 0.5$,
the system does not have a unique fixed point of $\langle p \rangle$, unlike the cases with
$\beta \neq 0.5$.



FIG. 4: (Color online) The average tagivity $\langle p \rangle$ as a function of $\beta$ for
various $\delta_l$. The analytical result for $\delta_l \to 0$ is shown by the green line.

These analytical predictions of $\langle p \rangle$ with $\delta_l = 0$ are compared to
simulation results with $\delta_l > 0$. As the results obtained from the two update rules are
similar, we present only the results obtained from the linear update. The green line in Fig. 4
shows the analytical stable fixed points of $\langle p \rangle$ with $\delta_l = 0$. We find that
simulations with small $\delta_l$ agree well with the analytical limit, and for $\delta_l > 0$,
$\langle p \rangle$ decreases with increasing $\beta$ as well as increasing $\delta_l$. As we can
see, all the simulation results show an abrupt change in $\langle p \rangle$ at
$\beta \approx 0.5$, suggesting the existence of a phase transition as predicted by the
analytical results. We remark that $\beta = 0.5$ corresponds to the case in Eqs. (5)-(7) where
the weight to imitate others is equal to that to stay unchanged. These results imply that when
users have low confidence, they tend to imitate each other in tagging, which leads to a steady
state of active tag assignment. However, when users are confident and resistant to change, they
stay with their own practice, which results in a steady state with inactive tagging. These two
behaviors are connected by an abrupt change when the confidence increases across $\beta = 0.5$.
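
The identity in Eq. (13) can be checked directly with two geometric series. Writing $x = \langle p \rangle$ with $0 \le x < 1$:

```latex
\begin{align}
\sum_{n=1}^{\infty} x^{n-1} &= \frac{1}{1-x}, \\
\sum_{n=1}^{\infty} x^{n-1}\left(x^{n-1} + x^{n}\right)
  &= (1+x)\sum_{n=1}^{\infty} x^{2(n-1)}
   = \frac{1+x}{1-x^{2}}
   = \frac{1}{1-x}.
\end{align}
```

The two sums are equal, so their difference, which is the left-hand side of Eq. (13), vanishes for every $x$ in this range. This is why no particular value of $\langle p \rangle$ is singled out at $\beta = 0.5$.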

To show explicitly how users self-organize to attain the steady state, we start the system at
the unstable fixed point and examine how it evolves to the stable fixed point after a slight
perturbation. The black line in Fig. 5 corresponds to the average tagivity for the case when
users with low confidence (i.e. $\beta < 0.5$) are initialized with zero tagivity. At time
$t_0$, one of the users assigns more than one tag, which initiates imitation by the others. As we
can see, the average tagivity slowly increases after $t_0$ and saturates at a non-zero steady
value, corresponding to the self-organization from inactive to active tagging. By contrast, the
red line shows the case when users are initialized with $p_i(0) = 1$ and large confidence (i.e.
$\beta > 0.5$). A maximum post length is set to avoid infinite tagging. At time $t_0$, one user
assigns the minimum number of tags, which initiates imitation by the others. As we can see, the
average tagivity slowly decreases after $t_0$ and becomes steady at zero, corresponding to the
self-organization from active to inactive tagging.

FIG. 5: (Color online) The dynamics through which the self-organization is established. Black
line: $p_i(0) = 0$ for all users, and at time $t_0$ one user assigns more than one tag. Red line:
$p_i(0) = 1$ for all users, and at time $t_0$ one user assigns only one tag.



V. EMPIRICAL RESULTS

FIG. 6: (Color online) Empirical distribution of the number of tags in each post as compared to
simulations. Circles represent empirical data; blue solid lines and red dashed lines respectively
represent simulation results with linear and multiplicative update. (a) Data from delicious.com
compared to simulations with parameters $\beta = 0.45$ and $\delta_l = 0.08$ for linear update
and $\beta = 0.45$ and $\delta_m = 0.1$ for multiplicative update. (b) Data from flickr.com
compared to simulations with parameters $\beta = 0.4$, $\delta_l = 0.06$ for linear update and
$\beta = 0.4$, $\delta_m = 0.2$ for multiplicative update.



As it is difficult to define and obtain the tagivity for real users, other well-defined
quantities are used for comparison. We compare the distributions of post length obtained from the
model with two real datasets: (1) delicious.com, a social bookmarking website for saving, sharing
and discovering bookmarks associated with tags; (2) flickr.com, an image hosting website which
encourages users to organize their pictures with tags.

We show in Fig. 6(a) and (b) the distributions of the post length (as open circles) obtained
respectively from delicious.com and flickr.com. Posts without tags are removed from the
statistics. It is interesting to note that the two distributions display similar behaviors: an
initial fast decay for post lengths less than 8, followed by a power-law decay for intermediate
post lengths, and then a high tail. The exponents of the power-law decay are 4.1 and 4.3
respectively for delicious.com and flickr.com, with average post lengths of approximately 2.9
and 3.4. The simulated distributions are plotted in Fig. 6 as blue and red lines respectively for
the linear and multiplicative updates, all with $\beta < 0.5$. These results may suggest that
real users are of low confidence and tend to imitate each other in tag assignment.

As we can see, the simulation results based on the linear update agree better with the empirical
data than those of the multiplicative update. With the linear update, the high tails of the
empirical data are also well fitted. According to Fig. 2, the tagivity distribution obtained from
the linear update shows a slower decay at large $p$, as compared to the faster decay in the
multiplicative case. The slow decay at large $p$, i.e. more users being found with a large
tagging tendency, may explain the high tail in the post length distributions.
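
The link between the tagivity distribution and the tail of the post length distribution can be illustrated with a short calculation. The sketch below averages the geometric distribution of Eq. (1) over a population of users; the two populations are hypothetical values chosen for illustration, not fitted to either dataset, and the function name is an illustrative choice.

```python
def post_length_distribution(tagivities, max_len):
    """Population-averaged Pr[post length = l] for l = 1, ..., max_len.

    Each user with tagivity p contributes p**(l - 1) * (1 - p), as in Eq. (1).
    """
    num_users = len(tagivities)
    return [
        sum(p ** (l - 1) * (1.0 - p) for p in tagivities) / num_users
        for l in range(1, max_len + 1)
    ]


# Two hypothetical populations with the same mean tagivity of 0.6.
narrow = [0.6, 0.6, 0.6, 0.6]
broad = [0.3, 0.5, 0.7, 0.9]
narrow_dist = post_length_distribution(narrow, 30)
broad_dist = post_length_distribution(broad, 30)
```

Although both populations have the same mean tagivity, the broad one assigns far more probability to long posts, because the few users with tagivity close to 1 dominate the tail.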

VI. CONCLUSIONS AND DISCUSSION

In this paper, we proposed a model to illustrate the
self-organization of tagging behaviors in social tagging 
systems, where individuals imitate each other in tag as- 
signment and eventually result in a self-organized state. 
With linear update on the tagging tendency, namely ta- 
givity, the corresponding steady distribution resembles 
Gaussian distribution. On the other hand, the steady dis- 
tribution resembles log-normal distribution when multi- 
plicative update is employed. In addition, we found that 
when users are of low confidence, they tend to imitate 
others and the system ends with a steady state of active 
tagging. By contrast, when users are of high confidence, 
the system will reach a steady state of inactive tagging. 
Abrupt changes are observed when user confidence in- 
creases and the system changes from one regime to the 
other, suggesting a phase transition separating the ac- 
tive and inactive tagging. Analyses on convergence time 
suggest a slow dynamics around the parameter range of 
phase changes, which provides further evidence for the 
transition. Finally, the post length distributions of the 
model are compared to two real datasets obtained from 
delicious.com and flickr.com, which show good agree- 
ments. 

Social tagging systems have been studied with approaches ranging from graph theory to
statistics, which may overlook the interactions and dynamics among individuals. The model
introduced in this paper provides a simple yet interesting description of evolving social tagging
systems, which might be generalized to other systems where self-organization is observed. The
proposed model may also shed light on applications (e.g. recommender systems [16, 17]) which
combine statistical physics and agent-based models [18] in understanding tagging systems as well
as other social systems [19].